Artificial intelligence for telemedicine diabetic retinopathy screening: a review

Abstract
Purpose: This study aims to compare artificial intelligence (AI) systems applied in diabetic retinopathy (DR) teleophthalmology screening, currently deployed systems, fairness initiatives and the challenges for implementation.
Methods: The review included articles retrieved through a PubMed/Medline/EMBASE literature search on telemedicine, DR and AI. The screening criteria included human studies in English, Portuguese or Spanish related to telemedicine and AI for DR screening. The authors' affiliations and the income group of each study's population were classified according to the World Bank Country and Lending Groups.
Results: The literature search yielded a total of 132 articles, and nine were included after full-text assessment. The selected articles were published between 2004 and 2020 and were grouped as telemedicine systems, algorithms, economic analysis and image quality assessment. Four telemedicine systems that perform quality assessment, image preprocessing and pathological screening were reviewed. None of the algorithms underwent a data or post-deployment bias assessment, and none of the studies evaluated the social impact of implementation. The reviewed articles lack representativeness: most authors and target populations are from high-income countries, and low-income countries are not represented.
Conclusions: Telemedicine and AI hold great promise for augmenting decision-making in medical care, expanding patient access and enhancing cost-effectiveness. Economic studies and social science analysis are crucial to support the implementation of AI in teleophthalmology screening programs. Promoting fairness and generalizability in automated systems combined with telemedicine screening programs is not straightforward. Improving data representativeness, reducing biases and promoting equity in deployment and post-deployment studies are all critical steps in model development.


Background
The World Health Organization defines telehealth and telemedicine as the use of information and communication technologies by healthcare professionals to diagnose and treat patients and to prevent diseases and injuries [1]. Telehealth rests on a clinical support premise: it overcomes geographic distances through information and communication technology to enable the exchange of health guidance [2].
The first telemedicine initiative took place more than a century ago, when telephone use started [3]. After 1970, the transmission of video, images and audio became more affordable, and in the past three decades, improvements in internet bandwidth and the widespread use of portable electronics, digital communication devices and wearables have contributed to growing interest in telemedicine [3,4].
The COVID-19 pandemic led to restrictions on in-person medical care, prompting the adoption of digital healthcare solutions, including teleconsultation, to cope with social distancing [4,5].
The premise of digital consultations is to reduce distances and offer an alternative to in-person consultation, which is valuable for providing healthcare to remote areas, to patients with chronic conditions that need regular care and to patients with mobility restrictions [6].
In ophthalmology, ancillary imaging exams create a great opportunity for telemedicine and artificial intelligence (AI) algorithm development, with diabetic retinopathy (DR) as the most explored disease. DR is a chronic microvascular complication of diabetes mellitus and remains a leading cause of irreversible blindness among working-age adults globally [7][8][9]. The International Council of Ophthalmology recognizes that early diagnosis and treatment are essential to better visual outcomes and recommends at least an annual dilated ophthalmological examination for every patient with diabetes, with retinal assessment possible through fundus photography [8,10].
DR screening is the most studied and explored application of these algorithms. The models have demonstrated high sensitivity and specificity, particularly in detecting even mild cases and patients with vision-threatening DR [11][12][13][14].
The benefits of teleretinal DR screening programs have been established and recognized worldwide, with standards and guidelines published by the American Telemedicine Association [15]. While telemedicine can reduce geographic barriers, AI systems have the potential to address the limited availability of healthcare providers, improve workflow and enhance decision-making. Nevertheless, achieving fair and generalizable algorithms and equitable outcomes remains a challenge for implementing AI systems in healthcare.
This article aims to compare AI systems applied in DR telemedicine screening studies, currently deployed systems, fairness initiatives and the challenges for implementation.

Methods
This review includes AI-enabled telemedicine programs focused on DR screening, comparing the articles' authors, model technical parameters, metric results and limitations. For the authors' affiliations and the study populations, we used the World Bank Country and Lending Groups classification for the publication year: high-income, upper-middle-income, lower-middle-income (LMIC) and low-income countries (LIC) [16]. Two authors (LFN and LZR) performed the article evaluation process. The first screening, based on title analysis, excluded non-human studies, articles in languages other than English, Portuguese or Spanish, and articles not related to telemedicine and DR. The second screening excluded articles not related to telemedicine and DR after abstract review. The articles were grouped as computational models, image quality, economic analysis and teleophthalmology screening descriptions.

Results
The search strategy identified a total of 132 articles. In the first screening, 56 articles were excluded (non-human studies, language, or not related to telemedicine and DR), and in the second screening, 59 articles were excluded after abstract review. The remaining 17 articles were eligible for full-text analysis. After full-text analysis, one editorial, one comment, three review articles and three articles unrelated to DR and AI were excluded (Figure 1).

General analysis
Our review included nine articles published between 2004 and 2020, covering telemedicine system models, algorithms, image quality models and economic analysis. The authors' affiliations were from 10 different countries, with the United States (26.67%) and the United Kingdom (13.33%) having the highest representation. We found that 86.67% of the authors were from high-income countries, 6.67% from upper-middle-income countries and 6.67% from lower-middle-income countries. Among the corresponding authors, 88.89% were from high-income countries and the remaining 11.11% from lower-middle-income countries. In four articles, collaborations extended across countries, yet these collaborations exclusively involved authors from high-income countries. The study populations were from nine different countries, with the United States being the most represented (25%). We found that 93.75% of the population was from high-income countries and the remaining 6.25% from LMIC. None of the studies was conducted in LIC (Figure 2).
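The reported percentages can be reproduced from simple counts. The sketch below assumes 15 author affiliations, 9 corresponding authors and 16 study populations. These denominators are inferred from the percentages (for example, 13/15 = 86.67%) and are not stated in the article. The `income_group_shares` helper is hypothetical.

```python
from collections import Counter


def income_group_shares(groups):
    """Percentage of entries in each World Bank income group, rounded to two decimals."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


# Hypothetical counts inferred from the reported percentages (not stated in the article).
affiliations = ["high"] * 13 + ["upper-middle"] + ["lower-middle"]
corresponding = ["high"] * 8 + ["lower-middle"]
populations = ["high"] * 15 + ["lower-middle"]

for name, groups in [
    ("affiliations", affiliations),
    ("corresponding authors", corresponding),
    ("populations", populations),
]:
    print(name, income_group_shares(groups))
```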

Telemedicine systems
Five articles, published from 2004 to 2019, discuss telemedicine systems to assist the DR screening process. Two articles, by Hejlesen et al. (2004) and Schneider et al. (2005), presented an internet-based digital communication platform called TOSCA. The platform enables data transfer and database construction across England, Germany, Ireland and Denmark [17,18]. The image analysis routine begins with a polynomial image transformation that aligns images according to the blood vessels, followed by preprocessing and retinal lesion extraction. Next, the images undergo a classification step using supervised algorithms: a Bayes classifier, a Mahalanobis distance classifier and a k-nearest neighbour classifier. Lastly, the platform aims to construct a normative reference database for evaluating algorithms and for further research [17]. However, no information is available about the data used for algorithm development or about bias control.
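The sketch below implements two of the supervised classifiers named above: a Mahalanobis distance classifier and a k-nearest neighbour classifier. It runs on hypothetical two-feature lesion descriptors (candidate area in pixels and contrast). The source does not describe TOSCA's features, training data or parameters, so everything here, including `k=3`, is an illustrative assumption. The Bayes classifier is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical training data: (candidate area in pixels, contrast) per class.
TRAINING = {
    "lesion": [(30, 0.80), (35, 0.90), (28, 0.70), (40, 0.85), (33, 0.95)],
    "background": [(5, 0.20), (8, 0.10), (6, 0.30), (4, 0.25), (7, 0.15)],
}


def mean_and_inverse_cov(rows):
    """Class mean and inverse of the 2x2 sample covariance matrix."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis(x, mean, inv):
    """Distance of x from a class mean, scaled by that class's covariance."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding error


def fit_mahalanobis(training):
    return {label: mean_and_inverse_cov(rows) for label, rows in training.items()}


def predict_mahalanobis(model, x):
    """Assign x to the class with the smallest Mahalanobis distance."""
    return min(model, key=lambda label: mahalanobis(x, *model[label]))


def predict_knn(training, x, k=3):
    """Majority vote among the k nearest training samples (unscaled Euclidean distance)."""
    neighbours = sorted(
        (math.dist(x, row), label) for label, rows in training.items() for row in rows
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


model = fit_mahalanobis(TRAINING)
print(predict_mahalanobis(model, (32, 0.8)), predict_knn(TRAINING, (32, 0.8)))
```

Unlike k-nearest neighbour, the Mahalanobis classifier accounts for each class's spread and feature correlation, so features on different scales (pixels versus contrast) matter less. For k-nearest neighbour, features would normally be standardized first.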
In 2011, Joshi and Sivaswamy proposed a web-based DR telescreening framework called DrishtiCare [19]. The platform receives images and clinical information from an external collection site and performs real-time image quality assessment before data transfer. A server-based prescreening model selects abnormal exams and refers them for specialist evaluation; the images are preprocessed, and retinal abnormalities are detected and highlighted [19]. DrishtiCare therefore combines automated quality assessment, image screening and lesion detection. The system was evaluated in three primary eye care hospitals in India and is not currently deployed in clinical practice. Details of the data and algorithms, as well as any bias assessment, are not reported [19].
In 2013, Karnowski et al. proposed the Telemedical Retinal Image Analysis and Diagnosis (TRIAD) network for retinal image analysis [20]. The system included automated quality assessment, detection of the vascular, optic nerve and macular structures, and lesion identification. Quality assessment used a support vector machine that assigned a quality ranking based on vascular segmentation and measurements in localized neighbourhoods [20]. Detection of the optic disc and macula establishes a coordinate system, and false positives are removed using the local linearity of the vessels, piecewise connectivity and vessel brightness with a Gaussian-like profile [21]. Abnormality detection relied on custom detectors for the most prominent lesions, such as microaneurysms, exudates and drusen, among other findings [20]. For the TRIAD network, the authors did not describe the dataset used or any bias control measures.
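The vessel-profile criterion can be illustrated with a simple template check. This sketch is not TRIAD's published method. It correlates a cross-sectional intensity profile with an inverted Gaussian, because vessels are darker than the surrounding retina in green-channel fundus images. The `sigma=1.5` width and the 0.9 correlation threshold are hypothetical.

```python
import math


def gaussian_dip(width, sigma):
    """Inverted Gaussian template centred on the profile (dark vessel on brighter background)."""
    if width % 2 == 0:
        raise ValueError("profile width must be odd so the template is centred")
    half = width // 2
    return [-math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def looks_like_vessel(profile, sigma=1.5, threshold=0.9):
    """Hypothetical rule: keep a candidate whose cross-section matches the Gaussian dip."""
    return pearson(profile, gaussian_dip(len(profile), sigma)) >= threshold


vessel = [120, 118, 110, 90, 70, 90, 110, 118, 120]  # dark, bell-shaped dip
ramp = [100, 105, 110, 115, 120, 125, 130, 135, 140]  # illumination gradient, no vessel
print(looks_like_vessel(vessel), looks_like_vessel(ramp))  # True False
```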
In 2019, Saeed et al. evaluated a telemedicine system that detects pathological DR changes in retinal fundus photographs [22]. The group proposed a cloud-based ophthalmic system applied to a Polish population. Image preprocessing consisted of conversion to grayscale using the green channel, histogram stretching, median filtering and gamma correction. The next step extracted the vascular pattern through vessel segmentation and binarization, followed by removal of the vessels and optic disc. Lastly, pathological changes were identified in the image classification step, and an image was classified as healthy if no findings were present. The reported results come from a 100-image dataset, with 98% accuracy, 100% sensitivity and 96% specificity in detecting pathological images. Although the system showed high sensitivity, specificity and accuracy on this validation set, no details of the algorithm architecture, the included dataset or bias control are reported [22].
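The preprocessing steps listed above can be sketched in plain Python on a toy image. The source does not report the parameters Saeed et al. used. The min-max form of histogram stretching, the 3x3 median window and `gamma=0.8` are therefore assumptions.

```python
import statistics


def green_channel(rgb):
    """Keep the green channel of each (R, G, B) pixel as a grayscale value."""
    return [[pixel[1] for pixel in row] for row in rgb]


def stretch_histogram(gray):
    """Linearly map the darkest pixel to 0.0 and the brightest to 1.0."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [[0.0 for _ in row] for row in gray]
    return [[(v - lo) / (hi - lo) for v in row] for row in gray]


def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood (clipped at the borders)."""
    h, w = len(img), len(img[0])
    return [
        [
            statistics.median(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def gamma_correct(img, gamma=0.8):
    """Apply v ** gamma to intensities in [0, 1]; gamma < 1 brightens dark regions."""
    return [[v ** gamma for v in row] for row in img]


def preprocess(rgb, gamma=0.8):
    """Green channel -> histogram stretching -> median filtering -> gamma correction."""
    return gamma_correct(median_filter_3x3(stretch_histogram(green_channel(rgb))), gamma)


# 3x3 toy image with a single bright "noise" pixel in the centre.
toy = [[(10, 50, 5)] * 3, [(10, 50, 5), (10, 250, 5), (10, 50, 5)], [(10, 50, 5)] * 3]
print(preprocess(toy))  # every pixel is 0.0: the median filter removed the spike
```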

Algorithms
Two articles discuss automated algorithms applied to DR screening. In 2014, Ogunyemi et al. described a model to predict which patients are at high risk of developing DR, using clinical variables from 513 patients with type 2 diabetes in South Los Angeles, California. The group applied a Bayesian network and a radial basis function neural network in Weka [23] and reported 28.5% sensitivity, 93% specificity and an AUC of 0.69 using all variables. The model restricted to the six most important features reached 26.2% sensitivity, 94.5% specificity and an AUC of 0.71 [23]. The article does not report bias control, and the model is not designed as a teleophthalmology system.
Walton et al. evaluated the effectiveness of the Intelligent Retinal Imaging System (IRIS) in detecting vision-threatening DR and compared its performance with reading centre interpretation [24]. IRIS was evaluated in 15,015 patients from the Harris Health System, Texas. Compared with the reading centre interpretation, the reported sensitivity was 66.4%, with a false-negative rate of 2%, and the specificity was 72.8% [24]. IRIS is a proprietary system, and details of model development, the dataset used and bias assessment are not provided.
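A model can pair low sensitivity and high specificity at a fixed cut-off with a moderate AUC. The sketch below reproduces that pattern on toy data, using hypothetical risk scores and a hypothetical 0.6 threshold. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 1 = developed DR, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
risk = [0.90, 0.40, 0.35, 0.20, 0.50, 0.30, 0.25, 0.10, 0.05, 0.15]
y_pred = [int(s >= 0.6) for s in risk]

print(sensitivity_specificity(y_true, y_pred))  # (0.25, 1.0)
print(round(roc_auc(y_true, risk), 3))  # 0.792
```

Lowering the cut-off would raise sensitivity at the cost of specificity. The AUC summarizes all cut-offs, so it would not change.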

Economic analysis
One article addresses the economic analysis of AI deployment in DR telescreening. Xie et al. compared two deep learning-based models, a semi-automated model that triages abnormal images for human grading and a fully automated model, with the current human DR assessment. The deep learning system is an ensemble of three neural networks (VGGNet, ResNet and DenseNet) trained on 76,370 Singaporean images to detect referable DR. The study was carried out within the Singaporean Diabetic Retinopathy Screening Program [25].
In the study, the semi-automated screening model was the least expensive, while human assessment was the most expensive. There was no statistically significant difference in the number of patients referred for DR across the groups, and the analysis was carried out until the presence of DR was detected or excluded. The study estimates a 20% cost saving in DR screening from switching from the current human assessment ($77 per patient per 12 months) to a semi-automated assessment ($62 per patient per 12 months), and a 14.3% saving from switching to a fully automated assessment ($66 per patient per 12 months) [25].
In the sensitivity analysis, the cost of human graders, screening specificity and information technology costs were the most influential variables, with the specificity of the automated model being the most important factor affecting the cost difference. The authors report limited generalizability of the findings as a limitation; further research is needed to address labour costs, model performance and information technology infrastructure barriers in LMIC.
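The source does not state how the three networks' outputs are combined. As an illustration only, the sketch below contrasts two common ensembling rules: averaging the per-network probabilities and majority voting. The probability values and the 0.5 threshold are hypothetical.

```python
from statistics import mean


def ensemble_referable_dr(probabilities, threshold=0.5):
    """Average the per-network probabilities of referable DR, then apply a cut-off.

    `probabilities` maps a network name to its (hypothetical) predicted probability.
    """
    average = mean(list(probabilities.values()))
    return average, average >= threshold


def majority_vote(probabilities, threshold=0.5):
    """Refer if more than half of the networks individually exceed the cut-off."""
    votes = sum(p >= threshold for p in probabilities.values())
    return votes > len(probabilities) / 2


agree = {"VGGNet": 0.62, "ResNet": 0.48, "DenseNet": 0.55}
split = {"VGGNet": 0.90, "ResNet": 0.30, "DenseNet": 0.40}
print(ensemble_referable_dr(agree), majority_vote(agree))
print(ensemble_referable_dr(split), majority_vote(split))
```

Consider one confident network and two lukewarm ones (`split`): averaging refers the patient, but majority voting does not. This is why the combination rule matters.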
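The relative savings follow directly from the per-patient costs. With the rounded figures reported above, the semi-automated model saves about 19.5%, close to the 20% cited; the difference is presumably due to rounding. The helper name is illustrative.

```python
def relative_saving(baseline_cost, new_cost):
    """Percentage reduction in cost relative to the baseline."""
    return 100 * (baseline_cost - new_cost) / baseline_cost


# USD per patient per 12 months, as reported for the Singaporean program.
COST_PER_PATIENT_YEAR = {"human": 77, "semi-automated": 62, "fully automated": 66}

for model in ("semi-automated", "fully automated"):
    saving = relative_saving(COST_PER_PATIENT_YEAR["human"], COST_PER_PATIENT_YEAR[model])
    print(f"{model}: {saving:.1f}%")
# semi-automated: 19.5%
# fully automated: 14.3%
```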

Quality assessment
Quality assessment is one of the steps of ophthalmological telemedicine systems. In 2018, Saha et al. evaluated quality assessment methods applied to the triage of retinal fundus photographs. The first step was image preprocessing, including pixel cropping and mask application. For quality screening, the authors applied an AlexNet neural network trained and evaluated on 7000 images from the EyePACS dataset, relabelled by an ophthalmologist based on acceptance criteria. However, the imbalanced dataset limits the interpretation of the reported results, and the article does not provide a bias assessment [11].
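Class imbalance matters because a quality classifier can score high accuracy simply by accepting every image. The sketch below uses a hypothetical split of 900 gradable and 100 ungradable images, not the actual EyePACS distribution. It compares plain accuracy with balanced accuracy, the mean of per-class recall.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of its size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


y_true = ["gradable"] * 900 + ["ungradable"] * 100
y_pred = ["gradable"] * 1000  # trivial model that accepts every image
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))  # 0.9 0.5
```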

Discussion
DR is a leading cause of blindness in working-age adults and the main target of teleretinal screening programs. In our review, we found that although telemedicine and digital healthcare are increasingly implemented, even in LMIC, few studies evaluate the integration and deployment of automated systems into DR telescreening programs [26].
The FDA has approved several AI-assisted DR systems that are currently in clinical use, including IDx-DR, EyeArt and AEYE, which perform quality assessment and detect referable DR for specialist evaluation (Table 1). These systems can also be integrated with a telemedicine system [12][13][14]. None of the reviewed systems is currently deployed in clinical practice, and none assessed bias in the data inclusion, preprocessing or modelling steps. The included devices report variable sensitivity and specificity for DR screening, which raises concerns about their reliability and accuracy when deployed in real-world clinical settings.
Biases in AI models are a growing concern in clinical decision software: unbalanced datasets and inadequate monitoring and recalibration processes lead to unfair predictions on new data and for minority groups. Examples of biased algorithms and inherited data biases include algorithms that miss sepsis diagnoses in underrepresented populations [27,28], algorithms that underdiagnose melanoma in darker skin tones [29] and language models that show gender bias [30].
There is a lack of representativeness in the reviewed articles: no authors are from LIC, and most are from high-income countries. No study evaluates a LIC target population, and South America, Africa and Oceania are not represented. In LMIC, the shortage of healthcare professionals and medical specialists is more severe, and AI has the potential to mitigate this problem; however, greater representativeness is needed.
None of the reviewed articles includes a social science investigation appraising the impact of AI-assisted telemedicine programs. Social science studies help to clarify the unintended social consequences of AI implementation in each healthcare system.
Although many studies evaluate the cost-effectiveness of telemedicine programs, the study by Xie et al. is the first to evaluate the economic consequences of implementing an automated screening process in teleophthalmology screening programs [25]. The study concluded that the semi-automated model is the most cost-effective option for DR screening in the Singaporean setting, with grader costs, screening specificity and information technology costs as the most influential variables. Additional studies in different healthcare and economic settings are needed to evaluate the generalizability of AI implementation in DR screening telehealth. A cost-effectiveness analysis should consider screening uptake rates, grader and information technology labour costs, and model specificity according to the healthcare system.
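One concrete form of bias assessment is a subgroup audit that compares sensitivity across patient groups. The sketch below is hypothetical throughout: the group labels, the records and the 5-percentage-point tolerance are assumptions, not values from any reviewed system.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, has_referable_dr, flagged_by_model) tuples."""
    positives = defaultdict(int)
    detected = defaultdict(int)
    for group, has_dr, flagged in records:
        if has_dr:
            positives[group] += 1
            detected[group] += int(flagged)
    return {g: detected[g] / positives[g] for g in positives}


def groups_below_best(sensitivities, max_gap=0.05):
    """Groups whose sensitivity trails the best-performing group by more than max_gap."""
    best = max(sensitivities.values())
    return sorted(g for g, s in sensitivities.items() if best - s > max_gap)


# Hypothetical audit data: 10 positive cases per group, plus some negatives.
records = (
    [("group A", True, i < 9) for i in range(10)]
    + [("group B", True, i < 6) for i in range(10)]
    + [("group B", False, False) for _ in range(5)]
)
sens = sensitivity_by_group(records)
print(sens)  # {'group A': 0.9, 'group B': 0.6}
print(groups_below_best(sens))  # ['group B']
```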

Training and workflow integration
Ophthalmological imaging exams require operator training and basic knowledge of ocular anatomy and disease for good performance. AI-enabled devices additionally require training in using the software, interpreting its results and making decisions based on them. Deployed algorithms must integrate into the existing workflow, provide reliable performance and support a positive healthcare experience. However, challenges can arise when implementing AI systems, as with the Google Research DR screening algorithm, which demonstrated good performance but failed to integrate into the existing image collection workflow. This led to delays, frustration among healthcare professionals and unnecessary patient return visits [31,32]. Continuous training and education of healthcare professionals are also necessary to ensure that they are equipped to work with these complex digital systems.

Economic and infrastructure
Although equipment prices are a possible limiting factor, retinal camera costs become less important in large population-based screening programs. Implementing AI systems in screening programs requires additional infrastructure, including internet access, a data transfer platform and information technology personnel, which can increase exam costs. Internet access can be a limiting factor in remote areas, but new satellite internet providers can potentially expand connectivity, and smartphone-based retinal cameras can expand access to imaging.
Beyond initial deployment, post-deployment monitoring and recalibration demand specialized data science and healthcare personnel, which can be an additional barrier to access. Furthermore, differences between healthcare systems and countries' economic circumstances can hinder the widespread deployment of AI-assisted DR screening programs. Cost-effectiveness is commonly judged against thresholds based on a country's annual gross domestic product (GDP) per capita and can differ between healthcare systems. The incremental cost-effectiveness ratio (ICER) is the difference in cost between an intervention and its comparator divided by the difference in their health effects. The ICER supports the cost-effectiveness of AI-assisted DR screening; however, few studies evaluate the integration of AI systems into telemedicine screening programs [33][34][35]. In addition, the absence of systematic national DR screening initiatives in several countries, such as the United States, prevents the full benefits of integrating AI technology into telemedicine screening programs from being realized.
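Written out, ICER = (C_new - C_ref) / (E_new - E_ref), where C is cost and E is a health effect (for example, QALYs gained or DALYs averted). The sketch below computes it and applies one commonly cited convention, the 1x and 3x GDP-per-capita thresholds associated with WHO-CHOICE. The costs, effects and GDP value are hypothetical.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of health effect."""
    d_effect = effect_new - effect_ref
    if d_effect == 0:
        raise ValueError("ICER is undefined when the health effects are equal")
    return (cost_new - cost_ref) / d_effect


def judge(cost_new, cost_ref, effect_new, effect_ref, gdp_per_capita):
    """Label an intervention using 1x and 3x GDP-per-capita thresholds (one convention)."""
    d_cost, d_effect = cost_new - cost_ref, effect_new - effect_ref
    if d_effect <= 0:
        return "no health gain"
    if d_cost <= 0:
        return "dominant"  # at most the same cost and more effective
    ratio = d_cost / d_effect
    if ratio <= gdp_per_capita:
        return "highly cost-effective"
    if ratio <= 3 * gdp_per_capita:
        return "cost-effective"
    return "not cost-effective"


# Hypothetical program: total cost 120,000 vs 100,000; health effect 60 vs 50 units.
print(icer(120_000, 100_000, 60, 50))  # 2000.0
print(judge(120_000, 100_000, 60, 50, 1_500))  # cost-effective
```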

Biases
Biases are a growing concern in the development and deployment of AI systems, with numerous reports of problematic deployments in sepsis diagnosis, COVID-19 and patient triage [27,36,37]. One major source of bias is the use of non-representative data in algorithm training, which can produce models biased against underrepresented populations. In ophthalmology, the majority of datasets are from high-income countries, which limits the generalizability of AI algorithms to other regions [38]. To address this problem, it is crucial to ensure that the data used in AI model development are representative of the target population. Comprehensive data analysis during system development, together with monitoring and recalibration after deployment, is fundamental to ensuring fairness and accuracy in real-world settings.
The assessment of appropriate outcome metrics, bias and social impact in deployed AI systems remains an unsolved problem in healthcare. Adequate metrics and assessment tools are needed to evaluate the impact of AI deployments and to ensure that AI does not perpetuate or amplify existing biases in healthcare systems.
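Recalibration after deployment can be as simple as logistic intercept recalibration (calibration-in-the-large). The sketch below shifts every prediction on the logit scale until the mean predicted risk matches the observed event rate in a post-deployment sample. The sample is hypothetical, and in practice the calibration slope may also need updating.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def calibration_in_the_large(probs, outcomes):
    """Observed event rate minus mean predicted probability (0 means calibrated on average)."""
    return sum(outcomes) / len(outcomes) - sum(probs) / len(probs)


def recalibrate_intercept(probs, outcomes, lo=-10.0, hi=10.0, iters=60):
    """Find the logit shift delta so the mean recalibrated risk equals the observed rate.

    Assumes every probability lies strictly in (0, 1) and the observed rate is strictly
    between 0 and 1. The mean prediction increases with delta, so bisection applies.
    """
    target = sum(outcomes) / len(outcomes)
    logits = [logit(p) for p in probs]

    def mean_prediction(delta):
        return sum(sigmoid(z + delta) for z in logits) / len(logits)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_prediction(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sample: the model predicts 20% risk, but 40% of patients have the outcome.
probs = [0.2] * 10
outcomes = [1] * 4 + [0] * 6
delta = recalibrate_intercept(probs, outcomes)
updated = [sigmoid(logit(p) + delta) for p in probs]
print(round(calibration_in_the_large(probs, outcomes), 3), round(delta, 3), round(updated[0], 3))
# 0.2 0.981 0.4
```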

Health equity
The telemedicine premise of exchanging information digitally can be problematic for patients who lack access to technology or are unfamiliar with it, as well as in countries with inadequate telemedicine infrastructure. This can create barriers to healthcare, particularly for underrepresented populations and social groups. While telemedicine initiatives have been shown to be particularly beneficial in LMIC, where geographic distances and shortages of medical professionals are challenges, deploying AI models in these countries can exacerbate existing disparities and be dangerous to those populations.
It is crucial to carefully evaluate the potential risks and benefits of deploying AI systems in healthcare and to make efforts to mitigate the negative impacts on underrepresented populations. This includes addressing the underlying social determinants of health, as well as ongoing monitoring and recalibration of AI models to ensure that they are accurate and equitable. By prioritizing fairness and accessibility in the development and deployment of AI systems, we can help to ensure that telemedicine and other technological solutions fulfil their potential to improve healthcare outcomes for all.

Conclusions
Telemedicine and AI hold great promise for augmenting decision-making in medical care, expanding patient access and enhancing cost-effectiveness. DR screening has been one of the leading applications of AI in ophthalmology, with three US FDA-approved programs deployed in clinical settings. AI-enabled systems have the potential to streamline and optimize the screening process. In our review, the articles proposed AI systems for telemedicine that include quality assessment, preprocessing and pathological classification.
AI systems can reflect existing biases in healthcare, and promoting fairness and generalizability in automated systems is not straightforward. Improving data representativeness, reducing biases and promoting equity in deployment and post-deployment studies are all critical steps toward more equitable outcomes.
The lack of deployed AI-assisted telemedicine programs provides an opportunity to prioritize fair AI algorithms that promote health equity. Comprehensive post-deployment studies that assess not only model biases but also impact and recalibration are needed to ensure that these technologies are deployed in a way that promotes fairness and improves outcomes for every patient.

Disclosure statement
The authors declare that they have no competing interests.

Funding
LFN is a researcher supported by the Lemann Foundation, Instituto da Visão-IPEPO.

Table 1. FDA-approved diabetic retinopathy screening devices.